Some Cache Optimization with Enhanced Pipeline Scheduling
Authors
Abstract
Reducing data cache stalls is becoming more important as the gap between CPU speed and memory speed continues to grow. Many compiler-based techniques, including prefetching, have been proposed to mitigate the problem for numerical loops, but they are not very applicable to nonnumerical integer loops. One simple idea that works even for those loops is to separate cache-missing loads from their uses by as many cycles as the cache miss penalty, by scheduling the loads earlier. Unfortunately, naïve code motion of those loads does not work, since they are likely to become stuck at the loop boundary before being separated far enough from their uses. Moreover, moving the loads requires the instructions that define their operands to move as well, which would also block separation. To overcome these limitations, we propose separating them even across loop backedges and complex control flows. This can be implemented with the code motion techniques of enhanced pipeline scheduling, which allow such separation. Our experimental results on Itanium with Open64 show that the proposed technique can reduce the stalls and tangibly increase performance for some integer benchmarks that suffer seriously from data cache misses.
Similar resources
An Improved Optimization Model for Scheduling of a Multi-Product Tree-Like Pipeline
In the petroleum supply chain, oil refined products are often delivered to distribution centers by pipelines since they provide the most reliable and economical mode of transportation over large distances. This paper addresses the optimal scheduling of a complex pipeline network with multiple branching lines. The main challenge is to find the optimal sequence and time of product injections/deli...
Modulo Scheduling with Cache Reuse Information
Instruction scheduling in general, and software pipelining in particular, face the difficult task of scheduling operations in the presence of uncertain latencies. The largest contributor to these uncertain latencies is the use of cache memories, required to provide adequate memory access speed in modern processors. Scheduling for instruction-level parallel architectures with nonblocking caches usua...
Cache Pattern with Multi-Queries
Abstract—This article proposes a cache pattern with multi-queries and describes multi-query optimization with scheduling, caching and pipelining. A set of cache patterns is derived from a set of classes of multi-queries that are loaded into the cache. Each cache pattern represents a unique equivalence class in the set of patterns. The multi-query optimization with scheduling, caching and pipe...
Cache and Pipeline Sensitive Fixed Priority Scheduling for Preemptive Real-Time Systems
Current schedulability analyses for preemptive systems account for cache behaviour by adding preemption-caused cache reload costs. Thereby, they ignore the fact that delays due to cache misses often have a reduced impact because of pipeline effects. In this paper, these methods are called isolated. Pipeline-related preemption costs are not considered at all in current schedulability analyses. This...
Precise Instruction Scheduling
Pipeline depths in high-performance dynamically scheduled microprocessors are increasing steadily. In addition, level 1 caches are shrinking to meet latency constraints, while more levels of cache are being added to mitigate the resulting performance impact. Moreover, the growing schedule-to-execute window of deeply pipelined processors has required the use of speculative scheduling techniques. When these e...